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Abstract 

In the real world, experimental data are rarely, if ever, distributed as a normal (Gaussian) distribu- 
tion. As an example, a large set of data — such as the cross sections for particle scattering as a function 
of energy contained in the archives of the Particle Data GroupfT] — is a compendium of all published 
data, and hence, unscreened. Inspection of similar data sets quickly shows that, for many reasons, these 
data sets have many outliers — points well beyond what is expected from a normal distribution — thus 
ruling out the use of conventional \ 2 techniques. This note suggests an adaptive algorithm that allows a 
phenomenologist to apply to the data sample a sieve whose mesh is coarse enough to let the background 
fall through, but fine enough to retain the preponderance of the signal, thus sifting the data. A prescrip- 
tion is given for finding a robust estimate of the best-fit model parameters in the presence of a noisy 
background, together with a robust estimate of the model parameter errors, as well as a determination of 
the goodness-of-fit of the data to the theoretical hypothesis. Extensive computer simulations are carried 
out to test the algorithm for both its accuracy and stability under varying background conditions. 



1 Introduction 



In an idealized world where all of the data follow a normal (Gaussian) distribution, the use of the \ 2 likelihood 
technique, through minimization of x 2 , described in detail in lA.2l offers a powerful statistical analysis tool 
when fitting models to a data sample. It allows the phenomenologist to conclude either: 

• The model is accepted, based on the value of its Xmin- ^ certainly fits well when Xmim wnen compared 
to v, the numbers of degrees of freedom, has a reasonably high probability (x m i n ~ v )- On ^ ne °th er 
hand, it might be accepted with a much poorer Xmin' depending on the phenomenologist 's judgment. 
In any event, the goodness-of-fit of the data to the model is known and an informed judgment can be 
made. 

• Its parameter errors are such that a change of Ax 2 = 1 from x m in corresponds to changing a parameter 
by its standard error a. These errors and their correlations are summarized in the standard covariance 
matrix C discussed in Appendix IA. 21 

or 

• The model is rejected, because the probability that the data set fits the model is too low, i.e., Xmin 
v. 

This decision-making capability (of accepting or rejecting the model) is of primary importance, as is the 
ability to estimate the parameter errors and their correlations. 

Unfortunately, in the real world, experimental data sets are at best only approximately Gaussian and often 
are riddled with outliers — points far off from a best fit curve to the data, being many standard deviations 
away. This can be due to many sources, copying errors, bad measurements, wrong calibrations, misassignment 
of experimental errors, etc. It is this world that our note wishes to address — a world with many data points, 
and perhaps, many different experiments from many different experimenters, with possibly a significant 
number of outliers. 

In Section [21 we will propose our "Sieve" algorithm, an adaptive technique for discarding outliers while 
retaining the vast majority of the good data. This then allows us to estimate the goodness-of fit and make 
a robust determination of both the parameters and their errors — for a discussion of the term "robust" , see 
Appendix El In essence, we then retain all of the statistical benefits of the conventional x 2 technique. 

In Sections 13. 7. II and 13. 7. 21 we will apply the algorithm to high energy pp and pp scattering, as well as to 
ii~p and 7t + p scattering. Eight examples of real world experimental data, for both pp and pp scattering and 
n + p and ir~p scattering, are taken from the Particle Data Group archives[Ij and are illustrated in Figures^ 
121 and 01 respectively. The data in Fig. ^ are all of the known published data for the total cross sections 
<7pp and a pp for cms (center of mass) energies greater than 6 GeV. The measured pp p and p pp , where p is the 
ratio of the real to the imaginary portion of the forward scattering amplitude, are shown in Fig. [21 again 
for cms energies greater that 6 GeV. The data in Fig. 13 are all of the known published data for the total 
cross sections <J^- p and u^+ p for cms energies greater that 6 GeV. The measured p^- p and p^+p are shown 
in Fig. again for cms energies greater that 6 GeV. Detailed examination of Figures ^ [21 El and 0] show 
many points far off of the common trend, often at the same energy. Attempts to use the x 2 technique to fit 
these data with a model will always come up short. These fits will always return a huge value of Xmin/^' 
together with model parameters that are likely to be unreliable. 

In Section [21 we make three types of computer simulations, generating data normally distributed about 
a straight line, a constant, and about a parabola, along with outliers — artificial worlds where we know all 
of the answers, i.e., which points are signal and which are noise. Examples for the straight line, a constant 
(two cases) and the parabola are shown in Fig. EO HI and [HI Details are given in Sections 13.11 13.31 and 13.61 
The noise points in Fig. 01, E^, Ct, and Fig. [Ht are the diamonds, whereas the signal points are the circles. 

The dashed curve in Fig. [SJd is the result of a x 2 fit to all of the noisy data (100 signal plus 20 noise 
points) in Figure[I}]a, and is not a very good fit to the data. The solid line is the fit with the " Sieve" algorithm 
proposed in the next Section. It reproduces nicely the theoretical straight line y — 1 — 2x that was used to 
computer-generate data that were normally distributed about it, using random numbers. In this case, the 
20 noise points penetrated the signal down to a level Axf > 6. 
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In Fig. HJ) we show the results for fitting the constant y = 10. The noise points (diamonds) in Fig. 
penetrate the signal down to Axf > 4. 

In Fig. 03 we show the results for fitting the constant y = 10, where the noise points (diamonds) in Fig. 
penetrate the signal down to Axf > 9. 

In Fig. |Ht the data were generated about the parabola y = 1 + 2x + 0.5a; 2 , with background noise. Figure 
IHt shows the result of sifting the data according to our Sieve algorithm, described below. The noise points 
that are retained after invoking our algorithm are the diamonds in Fig. |SJd and the circles are the signal 
points that are retained. 

In Sections 13.21 and 13.31 we will calibrate the algorithm with extensive computer-generated numerical 
simulations and test it for stability and accuracy. The lessons learned from these computer simulations of 
events are summarized in Section 13.41 

Finally, in Appendix ^ we give mathematical details about fitting data using the robust A 2 (Lorentzian) 
maximum likelihood estimator that we employ in our "Sieve" algorithm and in particular, A 2 ,, which mini- 
mizes the rms (root mean square) widths of the parameter distributions, making them essentially the same 
as the rms distributions of a \ 2 fit- We also discuss fitting data with the more conventional \ 2 maximum 
likelihood estimator. 

2 The Adaptive Sieve Algorithm 
2.1 Major assumptions 

Our major assumptions about the experimental data are: 

1. The experimental data can be fitted by a model which successfully describes the data. 

2. The signal data are Gaussianly distributed, with Gaussian errors. 

3. That we have "outliers" only, so that the background consists only of points "far away" from the true 
signal. 

4. The noise data, i.e. the outliers, do not completely swamp the signal data. 



2.2 Algorithmic steps 

We now outline our adaptive Sieve algorithm, consisting of several steps: 

1. Make a robust fit (see Appendix El of all of the data (presumed outliers and all) by minimizing Aq, 
the Lorentzian squared, defined as 

N 

A 2 (a; x) ee ln {1 + 0.18A X 2 (x i; a)} , (1) 

i=l 



described in detail in the Appendix I A. 41 The Af-dimensional parameter space of the fit is given by 
a = {a%, . . . , Q.m}i x = {xi, ■ ■ ■ , xn} represents the abscissa of the N experimental measurements 

y = {yi, . . . , j/at} that are being fit and Ax 2 (xi'i a) = ^ Vi^M^i>£Ll \ ; where y{xi\ a) is the theoretical 
value at Xi and <Ji is the experimental error. As discussed in Appendix I A. 41 minimizing Aq gives the 
same total Xmin = J2i=i ^Xi( x i',ct) from eq. JU as that found in a \ 2 fit, as well as rms widths (errors) 
for the parameters — for Gaussianly distributed data — that are almost the same as those found in a x 2 
fit. The quantitative measure of "far away" from the true signal, i.e., point i is an outlier corresponding 

to Assumption J3J, is the magnitude of its Ax 2 (xi; a) = ^Ui^M^Ei^l^j 

If Xmin is satisfactory, make a conventional \ 2 fit to get the errors and you are finished. If \\ 
satisfactory, proceed to step |21 
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2. Using the above robust A 2 , fit as the initial estimator for the theoretical curve, evaluate Axf (xi;a), 
for each of the N experimental points. 

3. A largest cut, Axf (xf, a) max , must now be selected. For example, we might start the process with 
Axi(xi;a) max = 9. If any of the points have Axf{xf,a) > A\f (x,; a) max , reject them — they fell 
through the "Sieve". The choice of Axf (xf, a) max is an attempt to pick the largest "Sieve" size 
(largest Axf(xi; a) max ) that rejects all of the outliers, while minimizing the number of signal points 
rejected. 

4. Next, make a conventional % 2 fit to the sifted set — these data points are the ones that have been 
retained in the "Sieve". This fit is used to estimate Xmin- Since the data set has been truncated by 
eliminating the points with Axfixf, a) > Axf{xf, a) max , we must slightly renormalize the xfnin found 
to take this into account, by the factor 1Z. This effect is discussed later in detail in Section l3~4l 

If the renormalized x, 2 nin , i.e., 1Z x Xmin * s acceptable — in the conventional sense, using the \ 2 distribu- 
tion probability function — we consider the fit of the data to the model to be satisfactory and proceed 
to the next step. If the renormalized Xmin 1S n °t acceptable and Axf {xf, e*) max is not too small, we 
pick a smaller Axf a)max and go back to step EH The smallest value of Axfixi] a) max that makes 
much sense, in our opinion, is Axf(xi] a) m ax > 2. After all, one of our primary assumptions is that 
the noise doesn't swamp the signal. If it does, then we must discard the model — we can do nothing 
further with this model and data set! 

5. From the x 2 fit that was made to the "sifted" data in the preceding step, evaluate the parameters a. 
Next, evaluate the M x M covariance (squared error) matrix of the parameter space which was found 
in the x 2 fit- We find the new squared error matrix for the A 2 fit by multiplying the covariance matrix 
by the square of the factor r x 2 (for example, as shown later in Section Hi 2. 21 r x 2 ~ 1.02,1.05, 1.11 
and 1.14 for Axfixf, a) max = 9, 6, 4 and 2, respectively ). The values of r x 2 > 1 reflect the fact that 
a x 2 fit to the truncated Gaussian distribution that we obtain — after first making a robust fit — has a 
rms (root mean square) width which is somewhat greater than the rms width of the x 2 fit to the same 
untruncated distribution. Extensive computer simulations, summarized in Section 13.41 demonstrate 
that this robust method of error estimation yields accurate error estimates and error correlations, even 
in the presence of large backgrounds. 

You are now finished. The initial robust Aq fit has been used to allow the phenomenologist to find a 
sifted data set. The subsequent application of a x 2 fit to the sifted set gives stable estimates of the model 
parameters a, as well as a goodness-of-fit of the data to the model when Xmin * s renormalized for the effect 
of truncation due to the cut Axf{xf, a) max . Model parameter errors are found when the covariance (squared 
error) matrix of the x 2 fit is multiplied by the appropriate factor (r x 2) 2 for the cut Axf(xi] a) max . 

It is the combination of using both A 2 , (robust) fitting and x 2 fitting techniques on the sifted set that 
gives the Sieve algorithm its power to make both a robust estimate of the parameters a as well as a robust 
estimate of their errors, along with an estimate of the goodness-of-fit. 

Using this same sifted data set, you might then try to fit to a different theoretical model and find Xmin 
for this second model. Now one can compare the probability of each model in a meaningful way, by using 
the x 2 probability distribution function of the numbers of degrees of freedom for each of the models. If the 
second model had a very unlikely Xmin; it could now be eliminated. In any event, the model maker would 
now have an objective comparison of the probabilities of the two models. 

2.3 Evaluating the Sieve algorithm 

We will give two separate types of examples which illustrate the Sieve algorithm. In the first type, we 
computer-generated data, normally distributed about 

• a straight line, along with random noise to provide outliers, 

• a constant, along with random noise to provide outliers, 

• a parabola, with background noise normally distributed about a slightly different parabola, 
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the details of which are described below. The advantage here, of course, is that we know which points are 
signal and which points are noise. 

For our real world example, we took four types of experimental data for elementary particle scattering 
from the archives of the Particle Data Group £Q. For all energies above 6 GeV, we took total cross sections 
and p-values and made a fit to these data. These were all published data points and the entire sample was 
used in our fit. We then made separate fits to 

• pp and pp total cross sections and p-values, 

• ir~p and n + p total cross sections a and p-values, 
using eqns. J7J), © and I© below. 

3 Studies using large computer-generated data sets 

Extensive computer simulations were made using the straight line model yi = 1 — 2xi and the constant model 
Hi = 10. Over 500,000 events were computer-generated, with normal distributions of 100 signal points per 
event, some with no noise and others with 20% and 40% noise added, in order to investigate the accuracy 
and stability of the "Sieve" algorithm. The cuts A\f > 9, 6, 4 and 2 were investigated in detail. 

3.1 A straight line model 

An event consisted of generating 100 signal points plus either 20 or 40 background points, for a total of 
120 or 140 points, depending on the background level desired. Let RND be a random number, uniformly 
distributed from to 1. Using random number generators, the first 100 points used Xi = 10 x RND, where i 
is the point number. This gives a signal randomly distributed between x = and x — 10. For each point x iy 
a theoretical value y~i was found using ?/,• = 1 — 2a;,. Next, the value of Cj, the "experimental error", i.e, the 
error bar assigned to point i, was generated as o% = a; + on x RND. Using these (Tj, the y^s were generated, 
normally distributed^] about the value of y~i For i = 1 to 50, a; = 0.2, cti = 1.5, and for i = 51 to 100, 
a, = 0.2, on — 3. This sample of 100 points made up the signal. 

The 40 noise points, i = 101 to 140 were generated as follows. Each point was assigned an "experimental 
error" <ji = di+ctiX RND. The Xi were generated as x% = di+SiX RND. In order to provide outliers, the value 
of yi was fixed at yi = 1 — 2xi + / cut x Sign; x (pi +ft ) x cr, and the points were then placed at this fixed value of 
yi and given the "experimental error" cr,. The parameter / cut depended only on the value of A%f (o^; a) max 
that was chosen, being 1.9, 2.8, 3.4 or 4, for A%f (a:,; a) max = 2, 4, 6 or 9, respectively, and was independent 
of i. These choices of / cut made outliers that only existed for values of A%? (xf, a) > A%f (xf, a) max - 

For i = 101 to 116, <U = 0, $ = 10, a* = 0.75, a* = 0.5, h = 1.0, ft = 0.6. To make "doubles" at the 
same X4 as a signal point, if yi-100 > 1 — 2xi_ioo we pick Sign^ = +1; otherwise Sign^ = —1, so that the 
outlier is on the same side of the reference line 1 — 2 Xi clS IS the signal point. 

For i = 117 to 128, di = 0, Si = 10, a, = 0.5, ctj = 0.5, 6, = 1.0, ft = 0.6; Sign, was randomly chosen as 
+1 or -1. This generates outliers randomly distributed above and below the reference line, with x.- L randomly 
distributed from to 10. 

For i = 129 to 140, di = 8, Si = 2, a, = 0.5, a, = 0.5, h = 1.0, ft = 0.6; Sign; = +1. This makes points 
in a "corner" of the plot, since xi is now randomly distributed at the "edge" of the plot, between 8 and 10. 
Further, all of this points are above the line, since Sign^ is fixed at +1, giving these points a large lever arm 
in the fit. 

For the events generated with 20 noise points, the above recipes for background were simply halved. An 
example of such an event containing 120 points, for which Axf(xi; «) max = 6, is shown in Fig. [SJi, with the 
100 squares being the normally distributed data and the 20 circles being the noise data. 

After a robust fit to the entire 120 points, the sifted data set retained 100 points after the Axf > 6 
condition was applied. This fit had Xmin = 88.69, with an expected x 2 = v = 98, giving Xmin/' 7 = 0.905. 
Using a renormalization factor TZ = 1/0.901, we get a renormalized Xmin/ 11 = 1-01 — see Section I5~4l for details 
of the renormalization factor. After using the Sieve algorithm, by minimizing x 2 f° r the sifted set, we found 
that the best-fit straight line, y =< a > + < b > x, had < a >= 0.998 ± 0.12 and < b >= -2.014 ± 0.020. 
The parameter errors given above come from multiplying the errors found in a conventional x 2 fit to the 



- 4 - 



sifted data by the factor r x 2 = 1.05 — for details see Section l3~4l This turns out to be a high probability fit[2] 
with a probability of 0.48 (since the renormalized Xmin/ 1 ' = 1-01, whereas we expect < x 2 l v >= 1.0 ± 0.14). 

Figure [5Jd shows the results after the use of the Sieve procedure with Ax| max = 6. Of the original 120 
points, all 100 of the signal points were retained (squares), while no noise points (diamonds) were retained. 
The solid line is the best % 2 fit, y = 0.998 - 2.014a;. 

Had we applied a % 2 minimization to original 120 point data set, we would have found \ 2 = 570, which 
has infinitesimal statistical probability. The straight line resulting from that fit, y = 0.925 — 1.98a;, is also 
shown in Fig. \5]p as the dot-dashed curve. For large x, it tends to overestimate the true values. 

To investigate the stability of our procedure with respect to our choice of A% 2 , we reanalyzed the full 
data set for the cut-off, A% 2 max = 4. The evaluation of the parameters a and b was completely stable, 
essentially independent of the choice of Axf ■ The robustness of this procedure on this particular data set is 
evident. 

3.2 Distributional widths for the straight line model 

We now generate extensive computer simulations of data sets resulting from the straight line yt — l — 2xi using 
the recipe of Section I5TT1 with and without outliers, in order to test the Sieve algorithm. We have generated 
50,000 events with 20% background and 50,000 events with 40% background, for each cut Ax 2 max = 9, 6, 4 
and 2. We also generated 100,000 Gaussianly distributed events with no noise. 

3.2.1 Case 1 

We generated 100,000 Gaussianly distributed events with no noise. Let a and b be the intercept and slope 
of the straight line y = 1 — 2x and define < a > as the average a, < b > as the average b found for the 
100,000 straight-line events, each generated with 100 data points, using both a A 2 , (robust) fit and a x 2 fit- 
The purpose of this exercise was to find r(Ag), the ratio of the Aq rms parameter width ct(Aq) divided by 
E, the parameter error from the x 2 fit, i.e. 

as well as demonstrate that there were no biases (offsets) in parameter determinations found in A 2 and x 2 
fits. 

The measured offsets 1— < a x 2 >, 1— < a\2 >, —2— < 6 X 2 > and —2— < b\2 > were all numerically 
compatible with zero, as expected, indicating that the parameter expectations were not biased. 

Let a be the rms width of a parameter distribution and E the error from the x 2 covariant matrix. We 
found: 

a a {x 2 ) = 0.139 ± 0.002 and E a = 0.138 
a b {x 2 ) = 0.0261 ±0.003 and E h = 0.0241, 

showing that the rms widths a and parameter errors S were the same for the x 2 fit, as expected. Further, 
the width ratios r for the Aq fit are given by 

r a {Al) = 1.034 ±0.010 
r b {Al) = 1.029 ±0.011, 

demonstrating that: 

• the r's of the Aq are almost as good as that of the x 2 distribution, r(x 2 ) = 1. 

• the ratios of the rms A 2 width to the rms x 2 width for both parameters a and b are the same, i.e., we 
can now simply write 

r A » = ^ ~ 1-03. (2) 
Finally, we find that 1— < x 2 l v >— 0.00034 ± 0.00044, which is approximately zero, as expected. 
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3.2.2 Case 2 



For Case 2, we investigate data generated with 20% and 40% noise that have been subjected to the adaptive 
Sieve algorithm, i.e. the sifted data after cuts of ^Xi( x i> a ) max = 9, 6, 4 and 2. We investigated this 
truncated sample to measure possible biases and to obtain numerical values for r's. 

We generated 50,000 events, each with 100 points normally distributed and with either 20 or 40 outliers, 
for each cut. A robust fit was made to the entire sample (either 120 or 140 points) and we sifted the data, 
rejecting all points with either Ax 2 (xi',ct) > 9, 6, 4 and 2, according to how the data were generated. A 
conventional x 2 analysis was then made to the sifted data. The results are summarized in Table As 
before, we found that the widths from the \ 2 fit were slightly smaller than the widths from a robust fit, so 
we adopted only the results for the % 2 fit. 

There were negligible offsets 1— < a > and —2— < b >, being ~ 1 to 5% of the relevant rms widths, u a 
and Ob, for both the robust and \ 2 fits. 

In any individual x 2 fit to the jth data set, one measures dj, 6j, X a . , £&. and {Xmm/ l ')j- Thus, we 
characterize all of our computer simulations in terms of these 7 observables. 

We again find that the r x 2 values — defined as c/E — are the same, whether we are measuring a or b. 
They are given by r x 2 = cr/E = 1.034, 1.054, 1.098 and 1.162 for the cuts A X 2 {x l ; a) max = 9, 6, 4 and 2, 
respectively Further, they are the same for 20% noise and 40% noise, since the cuts rejected all of the 
noise points. In addition, the r values were found to be the same as the r values for the case of truncated 
pure signal, using the same A^ 2 ^; a) max cuts. The signal retained was 99.7, 98.57, 95.5 and 84.3 % for the 
cuts AXi(xi; «) max = 9, 6, 4 and 2, respectively — see Section l3~4l and eq. © for theoretical values of the 
amount of signal retained. 

We experimentally determine the rms widths a (the errors of the parameter) by multiplying the r value, 
a known quantity independent of the particular event, by the appropriate E which is measured for that event, 
i.e., 

a a = E a x r x 2 
a b = E b x r x 2. 

The rms widths are now determined for any particular data set by multiplying the known factors r x 2 by the 
appropriate E found (measured) from the covariant matrix of the x 2 fit of that data set. 

Also shown in Table H are the values of Xmin/ 1 ' found for the various cuts. We will compare these results 
later with those for the constant case, in Section l3~3l 

We again see that a sensible approach for data analysis-even where there are large backgrounds of 
~ 40% — is to use the parameter estimates for a and b from the truncated x 2 fit and assign their errors as 

a a — r x iYi a 

u b =r x 2E b , (3) 

where r v 2 is a function of the Ay- , cut utilized. Before estimating the goodness-of-fit, we must renormalize 

the observed Xmin/ 1 ' by the appropriate numerical factor for the Axf max cut used. 

This strategy of using an adaptive A\ 2 (xi; a) cut minimizes the error assignments, guarantees robust 
fit parameters with no significant bias and also returns a goodness-of-fit estimate. 

3.3 The constant model, = 10 

For this case, we investigate a different theoretical model (j/j = 10) with a different background distribution, 
to measure the values of r % 2 and < Xmin/^ >• 

An event consisted of generating 100 signal points plus either 20 or 40 background points, for a total 
of 120 or 140 points, depending on the background level desired. Again, let RND be a random number, 
uniformly distributed from to 1. Using random number generators, for the first 100 points i, a theoretical 
value y~i — 10 was chosen. Next, the value of <7j, the "experimental error" , i.e, the error bar assigned to point 
i, was generated as cr, = a, + x RND. Using these <7j, the y^s were generated, normally distributed^] 
about the value of yi = 10 . For i = 1 to 50, dj = 0.2, = 1.5, and for i = 51 to 100, a* = 0.2, a* = 3. 
This sample of 100 points made up the signal. 
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The 40 noise points, i = 101 to 140 were generated as follows. Each point was assigned an "experimental 
error" a — a* + en x RND. In order to provide outliers, the value of yi was fixed at yi = 10 + / cu t x sign; x 
{pi + f3i) x Gi and the points were then placed at this fixed value of yi and given the "experimental error" 
o i . The parameter / cut depended only on the value of Ax?(xj; a) max that was chosen, being 1.9, 2.8, 3.4 or 
4, for Axi ( x i'i a )max = 2, 4, 6 or 9, respectively, and was independent of i. 

For i — 101 to 116, a,i — 0.75, ati — 0.5, 6j = 1.0, f3i — 0.6; Sign^ was randomly chosen at +1 or -1. 

For i = 117 to 128, a* = 0.5, ctj = 0.5, hi = 1.0, (3i = 0.6; This generates outliers randomly distributed 
above and below the reference line, with Xi randomly distributed from to 10. 

For i = 129 to 140, a z = 0.5, a z = 0.5, b z = 1.0, ft = 0.6; Sign, = +1. This forces 12 points to be 
greater than 10, since Sign, is fixed at +1. For the events generated with 20 noise points, the above recipes 
for background were simply halved. 

Two examples of events with 40 background points are shown in FiguresEJi and[7|i, with the 100 squares 
being the normally distributed data and the 40 circles being the noise data. 

In Fig. 03 we show the results after using the cut A%f (xj; a) max = 4. No noise points (diamonds) were 
retained, and 98 signal points (circles) are shown. The best fit, y — 9.98 ± 0.074, is the solid line, whereas 
the dashed-dot curve is the fit to all 140 points. The observed Xmin/ 1 ' = 0-84 yields a renormalized value 
1Z x Xmin/ 11 — 1-09, in good agreement with the expected value Xmin/ 1 ' = 1 ± 0.14. If we had fit to the entire 
140 points, we would find Xmin/^ = 4.39, with the fit being the dashed-dot curve. 

In Fig. 03 we show the results after using the cut Ax?(xj;a) max = 9. No noise points (diamonds) were 
retained, and 98 signal points (circles) are shown. The best fit, y = 10.05 ± 0.074, is the solid line, whereas 
the dashed-dot curve is the fit to all 140 points. The observed Xmin/ 1 ' = 1-08 yields a renormalized value 
1Z x Xmin/ 1 ' = I'll) m good agreement with the expected value Xmin/ 11 = 1 ± 0.14. If we had fit to the 
entire 140 points, we would find Xmin/ 1 ' = 8.10, with the fit being the dashed-dot curve. The details of the 
renormalization of Xmin/ 17 ano - the assignment of the errors are given in Section l3.4l 

We computer-generated a total of 500,000 events, 50,000 events with 20% noise and an additional 50,000 
events with 40% noise, for each of the cuts Axf > 9, 6, 4 and 2, and 100,000 events with no noise. 

For the sample with no cut and no noise, we found r A 2 = 1.03 ± 0.02, equal to the value r A 2 = 1.03 that 
was found for the straight line case. 

Again, we found that our results for r x 2 were independent of background, as well as model, and only 
depended on the cut. We also found that the biases (offsets) for the constant case, (10— < a X 2 >), although 
non-zero for the noise cases, were small in comparison to a, the rms width. 

The results for cuts Axf max = 9, 6, 4 and 2 are detailed in Tabled We see in Tabled compared with 
the straight line results of Section 13.2.21 that the r x 2 values for the constant case are essentially identical, as 
expected. Further, we find the same results for the values of Xmin/ 1 ' as a function of the cut Axf(xi; a) . 

3.4 Lessons learned from computer studies of a straight line model and a con- 
stant model 

• As found in Sections 13.2.21 and 13.31 and detailed in Table ^ we have universal values of r x 2 and 
< Xmin > l v i as a function of the cut Axf (xjj a) , independent of both background and model. 

• A sensible conservative approach for large backgrounds (less than or the order 40%) is to use the 
parameter estimates from the x 2 fit to the sifted data and assign the parameter errors to the fitted 
robust parameters as 

"■(X 2 ) = r x 2 x S > 

where r x 2 is a function of the cut Ax 2 (a^i; a) max , given by the average of the straight line and constant 
cases of Tabled This strategy gives us a minimum parameter error, with only very small biases to the 
parameter estimates. 

• We must then renormalize the value found for Xmin/ 17 by the appropriate averaged value of < x m in > l v 
for the straight line and constant case, again as a function of the cut Axf (xil «) max - 
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Let us define A as the A%f max cut and 1Z as the renormalization factor that multiplies Xmin/ P - 

We find from inspection of Cases 1 to 2 for the straight line and of Section 13.31 for the case of the 
constant fit that a best fit parameterization of r x 2 , valid for A > 2 is given by 

r x 2 = 1 + 0.246e~ - 263A (4) 
We note that TZr 1 , for large v, is given analytically by 

K- 1 = / _x 2 e- x / 2 dx/ _e- x ' 2 dx 
2 e~ A / 2 

' ' (5) 



Graphical representations of r x 2 and IZr 1 are shown in Figures^ andEJ), respectively. Some numerical 
values are given in Table ^ and are compared to the computer-generated values found numerically for 
the straight line and constant cases. The agreement is excellent. 

Let us define o~q as the rms parameter width that we would have had for a \ 2 fit to the uncut sample, 
where the sample had had no background, and define So the error found from the covariant matrix. 
They are, of course, equal to each other, as well as being the smallest error possible. We note that the 
ratio ct/cto = r x 2 x E/Eo- This ratio is a function of the cut A through both r x 2 and E, since for a 
truncated distribution, E/Eo depends inversely on the square root of the fraction of signal points that 
survive the cut A. In particular, the survival fraction S.F. is given by 



S.F. 



+ \/A 1 

-^e- x I 2 dx = erf (y/A/2) (6) 
-v/A V2tt 



and is 99.73, 98.57, 95.45 and 84.27 % for the cuts A = 9, 6, 4 and 2, respectively. The survival fraction 
S.F. is shown in TableQas a function of the cut Ax 2 max , as well as is the ratio a/ag. We note that the 
true cost of truncating a Gaussian distribution, i.e., the enlargement of the error due to truncation, is 



vi r 

2. This rapid loss of accuracy is why the errors become intolerable for cuts Axjf smaller than 2. 



not r x 2, but rather r x 2 /V S.F., which ranges from ~ 1.02 to 1.25 when the cut Axf max goes from 9 to 



3.5 Fitting strategy 

We find that an effective strategy for eliminating noise and making robust parameter estimates, together 
with robust error assignments, is: 

1. Make an initial Aq fit to the entire data sample. If Xmin/^ 1S satisfactory, then make a standard \ 2 fit 
to the data and you are finished. If not, then proceed to the next step. 

2. Pick a large value of Ax 2 (a^; a) roax , e.g., Ax 2 (xi; a) max = 9. 

3. Obtain a sifted sample by throwing away all points with Ax 2 (xi; a) > A%|(xi; a) ma x- 

4. Make a conventional x 2 fit to the sifted sample. For your choice of Axf(xi; a) ma x, find TlT 1 from eq. JSJ. 
If the renormalized value 1Z x Xmin/ 11 1S sufficiently near 1, i.e., the goodness-of-fit is satisfactory, then 
go to the next step. If, on the other hand, IZx Xmin/^ i s too large, pick a smaller value of Ax 2 (xj; a) max 
and go to step[3](for example, if you had used a cut of 9, now pick Axf(xi; a) max = 6 and start again). 
Finally, if you reach Axf(xi; a) max = 2 and you still don't have success, quit — the background has 
penetrated too much into the signal for the "Sieve" algorithm to work properly. 
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5. a) Use the parameter estimates found from the Axf (xf, a) max fit in the previous step. 

b) Find a new squared error matrix by multiplying the covariant matrix C found in the x 2 fit by (r x 2) 2 . 
Use the value of r x 2 found in eq. for the chosen value of the cut Axf(xi; a) max to obtain a robust 
error estimate essentially independent of background distribution. 

You are now finished. You have made a robust determination of the parameters, their errors and 
the goodness-of-fit. 

The renormalization factors 1Z are only used in estimating the value of the goodness-of-fit, where small 
changes in this value are not very important. Indeed, it hardly matters if the estimated renormalized x 2 l v 
is between 1.00 and 1.01 — the possible variation of the expected renormalized % 2 /v due to the two different 
background distributions. After all, it is a subjective judgment call on the part of the phenomenologist 
as to whether the goodness-of-fit is satisfactory. For large v, only when x 2 /v starts approaching 1.5 does 
one really begin to start worrying about the model. For v ~ 100, the error expected in x 2 l v is ~ 0.14, so 
uncertainties in the renormalized x 2 l v °f the order of several percent play no critical role. The accuracy of 
the renormalized values is perfectly adequate for the purpose of judging whether to keep or discard a model. 

In summary, extensive computer simulations for sifted data sets show that by combining the x 2 parameter 
determinations with the corrected covariance matrix from the x 2 fit, we obtain also a "robust" estimate of the 
errors, basically independent of both the background distribution and the model. Further, the renormalized 
xtim/v i s a good predictor of the goodness-of-fit. Having to make a Aq fit to sift the data and then a x 2 fit 
to the sifted data is a small computing cost to pay compared to the ability to make accurate predictions. 
Clearly, if the data are not badly contaminated with outliers, e.g., if a Axj(xi; a) max = 6 fit is satisfactory, 
the additional penalty paid is that the errors are enlarged by a factor of ~ 1.06 (see Tabled, which is not 
unreasonable to rescue a data set. Finally, if you are not happy about the error determinations, you can use 
the parameter estimates you have found to make Monte Carlo simulations of your model[7]. By repeating a 
Aq fit to the simulated distributions and then sifting them to the same value of Ax 2 (xi; a) max as was used 
in the initial determination of the parameters, and finally, by making a x 2 fit to the simulated sifted set you 
can make an error determination based on the spread in the parameters found from the simulated data sets. 

3.6 The parabola 

As a final example of computer-generated data, we generated one noisy data set using a parabolic model. 
A total of 135 points were generated by computer. Using random number generators, the first 50 points 
generated picked x^s distributed randomly 3 from to 10. For each point X{, a theoretical value f/j was found 
using y~i = l + 2xi + 0.5x 2 . Next, the value of cr^ , the "experimental error" , i.e, the error bar assigned to point 
i, was generated randomly on the interval 0.2 to 2.7. The y,'s were then generated, normally distributed^ 
about the value of yi using the <Tj that had been previously found. The next 50 points were chosen in the 
same manner, except that these <ii were randomly distributed between 0.2 and 5.2. This sample of 100 points 
made up the signal. 

The 35 noise points were generated around a "nearby" parabola, given by y~i = 12 + 2xi + 0.2xj. The 
first 15 points had their Xi again randomly generated in the interval to 10. The error bars assigned to 
each point were randomly distributed in the interval 0.2 to 5.2. To provide the outliers, the value of the 
theoretical y~i was found using a new parabola y~i = 12 + Ixj, + 0.2x 2 . These points were then normally 
distributed using oVs uniformly distributed in the interval 0.8 to 20.8. The next 20 were generated in the 
same fashion, except that the error bars were uniformly distributed in the interval 0.2 to 8.2 and the yi 
values normally distributed with o^'s in the interval 1.6 to 65.6. In this case, we not only made "outliers", 
but also contaminated the sample with substantial "inliers" , since we used a "nearby parabola" to generate 
the background data. Of course, this violates our Assumption that we only have outliers, but gives us a 
feeling of what happens if substantial amounts of "inliers" are also present. 

The resulting distribution of 135 points is shown in Fig. [SJi, with the 100 squares being the normally 
distributed data and the 35 circles being the noise data. 

The sifted data set, shown in Fig. |SJd, retained 113 points after the Ax 2 max = 6 condition was applied to 
the original 135 points. At that point, we made both a conventional x 2 fit to the sifted data set in order to 
evaluate the parameters, their errors and the goodness of fit. The x 2 fit to the sifted data had Xmin = 123.6, 
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with v = 110, giving Xmin/^ = 1-12. Renormalizing using TZ found from eq. JSJ , we get the corrected 
1Z x Xmm/ 1 ' = 1-24, whereas we expect 1 ± 0.13. This is a reasonable fit [21 with a probability of ~ 0.06. 
After using the Sieve algorithm, by minimizing % 2 , we found that the best-fit parabola, y = c$ + c\x + C2X 2 , 
had Co = 1.18 ± 0.23 and c\ — 2.05 ± 0.05 and c 2 — 0.489 ± 0.005, where the errors have been renormalized 
by the factor r x 2 = 1.05 found from eq. (@J. 

Figure^ shows the results of using the Sieve procedure with the cut A% 2 (xi\ ex.) = 6. Of the original 
135 points, all 100 of the signal points were retained (squares). There were 13 noise points (circles) also 
retained, all very close to the fitted straight line. These points are the "inliers" that resulted from the 
background generation using the "nearby parabola" , violating our primary assumption that there are only 
"outliers" as background. Thus, it is of great interest to see how well the Sieve procedure worked. 

Had we applied a x 2 minimization to original 130 point data set, we would have found Xmin/ 1 ' = 19.93, 
which clearly has infinitesimal statistical probability. The parabola resulting from this x 2 fit is also shown 
111 Fig. IHb- It clearly misses many of the data points in the sifted set. 

When we fitted the parabola to only the 100 signal points, with no noise included, we got the parameters: 
Co = 0.97±0.21, ci = 2.13±0.05 and C2 = 0.480±0.005, using a conventional % 2 fit. These parameters, within 
errors the same as those found using the "Sieve" algorithm, give a curve that is essentially indistinguishable 
from the solid line in Fig. |Hb obtained using the Sieve algorithm. We note that even when the background 
produces some "inliers", i.e., the cut Ax 2 max does not remove all of the background, the Sieve procedure is 
still very useful. 

Finally, our procedure was completely stable for reasonable choices of A% 2 , giving essentially the same 
answer for Axf > 4, 6 or 9. Thus, even in the presence of ~ 13% "inliers" , the answer after using the "Sieve" 
was reasonable. The parameter values are relatively unaffected, as are the errors. The main concern is the 
higher corrected Xmin/ 1 ' that is due to the background points that are close to the true signal and thus can 
not be "Sieved" out. However, this only affects the goodness-of-fit estimate, making X%vn.l v somewhat larger. 
In the end, the conclusion as to whether to accept the model or reject it on the basis of the goodness-of-fit 
estimate is a subjective judgment of the phenomenologist. Many models have been accepted when the x 2 
probability has been as low as a few tenths of a percent. 



3.7 Real World data 

We will illustrate the Sieve algorithm by simultaneously fitting all of the published experimental data above 
t/s > 6 GeV for both the total cross sections a and p values for pp and pp scattering, as well as for ir~p 
and Tr + p scattering. The p value is the ratio of the real to the imaginary forward scattering amplitude and 
a/s is the cms energy E cms . The data sets used have been taken from the Web site of the Particle Data 
Group[IJ and have not been modified. They provide the energy (xi), the measurement value (yi) and the 
experimental error(cri), assumed to be a standard deviation, for each experimental point. 

Testing the hypothesis that the cross sections rise asymptotically as In 2 s, as s — > 00, the four functions 
o~ and p^- that we will simultaneously fit for ^fs > 6 GeV are: 



± = c + Cl lnfii) + c 2 ln 2 n +/ Vn" 1 ±^-) , (7) 



P ± = — -c 1+ c 27 rln - -/3^cothr - +— /+ ±<5tan — - , 8 

a ± [2 \mJ 2 \mJ v 2 \mJ J 



da ± { 1 1 f21n(Wm)'| , 

c i T^JZa + C2 l)(^/m)^ 2 } 



d(v/m) \{y/m) J (_ (yjrn) 

±5{(a~l)(v/mr~ 2 }, (9) 

where the upper sign is for pp (ir + p) and the lower sign is for pp (ir~p) scattering 0. Here, v is the laboratory 
energy of the projectile particle and m is the proton (pion) mass. The exponents p, and a are real, as are 
the 6 constants cq, c±, c%, ft-pr, 6 and the dispersion relation subtraction constant /+(0). We set p = 0.5, 
appropriate for a Regge-descending trajectory, leaving us 7 parameters. We then require the fit to be 
anchored by the experimental values of Op v and a pp (a^-p an d o'tt+p); as we U as their slopes, d ^ m ^ , at 
= 4 GeV for nucleon scattering and ^/s = 2.6 GeV for pion scattering. This in turn imposes 4 conditions 
on the above equations and we thus have three free parameters to fit: ci, C2 and /+(0). 



- 10 - 



3.7.1 pp and pp scattering 

The raw experimental data for pp and pp scattering that are shown in Figures ^ and [5] were taken from the 
Particle Data Group|I| . Figure ^ shows the ap p and a pp data for E cms > 6 GeV, whereas Fig. [3 shows all 
of the experimental p pp and p pp data for E cms > 6 GeV. There are a total of 218 points in these 4 data sets. 
We fit these 4 data sets simultaneously using eq. Q, eq. © and eq. ©■ Before we applied the Sieve, we 
obtained Xmin = H85.6, whereas we expected 215. Clearly, either the model doesn't work or there are a 
substantial number of outliers giving very large Axf contributions. The Sieve technique shows the latter to 
be the case. 

We now study the effectiveness and stability of the Sieve. Table contains the fitted results for pp and 
pp scattering using 3 different choices of the cut-off, Axf max = 4, 6 and 9. For each Axf max cut it tabulates: 

• the fitted parameters from the x 2 fit together with the errors found in the x 2 fit, 

• the total Xmin' 

• is, the number of degrees of freedom (d.f.) after the data have been sifted by the indicated Axf cut-off. 

To get robust errors, the errors quoted in Table|21for for each parameter should be multiplied by the common 
factor r x 2=1.05, from eq. (@J, using the cut A = 6. 

We note that for Axf ma x = 6, the number of retained data points is 193, whereas we started with 218, 
giving a background of ~ 13%. We have rejected 25 outlier points (5 <7 PP , 5 o~p p , 15 p pp and no p pp points) 
with Xmin changing from 1185.6 to 182.8. We find Xmin/ 11 = 0-96, which when renormalized using eq. JSJ for 
A = 6 becomes 7Z x Xmin/ 1 ' = 1-067, a very likely value with a probability of w 0.25. 

Obviously, we have cleaned up the sample — we have rejected 25 datum points which had an average 
Axf ~ 40! We have demonstrated that: (1) the goodness-of-fit of the model is excellent, and (2) we had 
very large Axf contributions from the outliers that we were able to Sieve out. These outliers, in addition 
to giving a huge Xmin/^j severely distort the parameters found in a conventional x 2 minimization, whereas 
they were easily handled by a robust fit which minimized Aq, followed by a x 2 fit to the sifted data. 

Inspection of Table El shows that the parameter values c\ , C2 and /+ (0) effectively do not depend on 
Axf max , our cut-off choice, having only very small changes compared to the predicted parameter errors. 

A further indication of the stability of the Sieve is illustrated in Table H As a function of y/s, we have 
tabulated: 

• the predicted total cross sections and p-values for pp and pp 

• the errors in their predictions generated by the errors in the fit parameters c\, ci and /+(0), 

for two different cut-off values, Ay? „„ = 4 and 6. The predicted cross sections and p-values for the two 

' ' v t III ciX 

values of Ax\ are virtually indistinguishable, giving us strong confidence in the Sieve technique when 
used with four different types of real-world experimental data. 

The results of applying the Sieve algorithm to the 4 data sets, along with the fitted curves, are graphically 
shown in Fig. 1 101 for o~ pp and o pp and in Fig. II II for p pp and p pp . The total number of data points shown in 
Fig. 1 101 and in Fig. ^]is 193, whereas we started with 218 points. The fits shown are in excellent agreement 
with the 193 data points. 

As a final test, we tried fitting another model which had its cross section energy dependence asymptotically 
rising as Ins. This is the equivalent of setting the parameter ci = 0, leaving us two free parameters to fit, c\ 
and /+(0). Using the same sifted data set which had given x m in = 182.8 for the In 2 s model we now obtained 
Xmin = H85.6 for only one more degree of freedom, clearly indicating that the Ins model was a very bad fit 
and could be excluded, whereas the In 2 s model gave a very good fit to the same data subset. 

3.7.2 7r ~p and ir + p scattering 

The raw experimental data for ir~p and ir + p scattering shown in Figures|3and01were taken from the Particle 
Data Group£Q. For E cms > 6 GeV, Figure shows the cr v - p and ov+ p data and Fig. 0]shows the p^- p and 
Ptt+p data. There are a total of 155 points in these 4 data sets. Before we applied the Sieve algorithm, we 
obtained x 2 = 527.8, whereas we expected 152, leading us to conclude that either the model doesn't work 
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or there are a substantial number of outliers giving very large Axf contributions. Once again, the Sieve 
technique shows the latter to be the case. 

Tabic 0] contains the fitted results for ir~p and n + p scattering using 3 different choices of the cut-off, 
Ax 2 max = 4, 6 and 9. For each Ax 2 max it tabulates: 

• the fitted parameters from the y 2 fit together with the errors found in the \ 2 n t> 

• the total Xmin> 

• v, the number of degrees of freedom (d.f.) after the data have been sifted by the indicated Ax 2 max 
cut-off. 

To get robust errors, the errors quoted in Table 21 for Axf(xi;a) — 6 for each parameter should be 
multiplied by the common factor r x a=1.05 of eq. (0J for the cut A = 6. 

For Ax? max = 6, the number of retained data points is 130, whereas we started with 155, a background of 
~ 19%. We have rejected 25 outlier points (2 a n + p , 19 o^- pi 4 p^+ p and no p n - p points) with Xmin changing 
from 527.8 to 148.1. We find Xmin/^ = 1-166, which when renormalized using eq. JSJ for A = 6 becomes 
TZ x Xmm/ U = 1-^6, corresponding to a probability of 0.03, which is acceptable being about a 2er effect. 

Again, we have cleaned up the sample. We have rejected 25 datum points which had an average A%| ~ 15. 
We have demonstrated that: (1) the model works, and (2) we had large A%| contributions from the outliers 
that we were able to Sieve out. 

Inspection of Tabled shows that the parameter values effectively do not depend on our choice of cut-off, 
A%| , not changing significantly compared to the predicted parameter errors. Another and perhaps better 
indication of the stability of the Sieve is illustrated in Table Tabulated as a function of y/s are: 

• the predicted total cross sections and p-values for ir~p and Tr + p 

• the errors in their predictions generated by the errors in the fit parameters Ci, ci and /+(0) 

for two different values of the cut-off, Ax 2 max = 4 and A%f max = 6. The predicted cross sections and p 
values for the two values of Axf max are essentially indistinguishable, again generating strong confidence in 
the Sieve technique when used with these four different examples of real-world experimental data. 

The results of applying the Sieve algorithm to the 4 data sets, along with the fitted curves, are graphically 
shown in Fig. 1121 for <J^~ p and u^+ p and in Fig. 1131 for p^- p and p^+p. The fits shown are in reasonable 
agreement with the 155 data points retained by the Sieve. 

Again, when we attempted to fit the sifted data set of 130 points with a Ins fit, we found X m in = 942.5, 
with v = 128, giving x 2 l v — 7.35, with a probability of << 10~ 45 . Thus, again a In 2 s fits well and a Ins fit 
is ruled out for the np system. 

4 Comments and conclusions 

We have shown that the Sieve algorithm works well in the case of backgrounds in the range of to ~ 40%, 
i.e., for extensive computer data that were generated about a straight line, as well as about a constant, 
and for a single event with a 20% outlier contamination as well as a 13%"inlier" contamination, that was 
generated about a parabola. It also works well for the ~ 13% to 19% contamination for the eight real-world 
data sets taken from the Particle Data Group 1 . However, the Sieve algorithm is clearly inapplicable in the 
situation where the outliers (noise) swamps the signal. In that case, nothing can be done. 

There are many possible choices for distributions resulting in robust fits. Our particular choice of mini- 
mizing the Lorentzian squared, Aq(o; x) = In |l + 0.18A% 2 (xi; a)}, in order to extract the parameters 
{ai, . . . , aju} needed to apply our Sieve technique seems to be a sensible one for both artificial computer- 
generated noisy distributions, as well as for real- world experimental data. This statement should not be 
interpreted as meaning that real- world data is truly well- approximated as a Lorentz distribution, but rather, 
as demonstrating that using the Lorentz distribution to get rid of outliers without sensibly affecting the fit pa- 
rameters works well in the real world. Next, the choice of filtering out all points with Axf > Ax\ max — where 
^Xi max ^ s as l ar S e as possible — is optimal in both minimizing the loss of good data and maximizing the loss 
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of outliers, resulting in a renormalizcd 1Z x X 2 a xal v ~ 1 f° r both the computer-generated and the real-world 
sample, as well as minimizing the distribution widths, and thus, the errors assigned to the parameters. 
In detail, the utilization of the "Sieved" sample with Ay, 2 < Ay-„„„ allows one to 

• use the unbiased parameter values found in a % 2 fit to the truncated sample for the cut A^;?(a;,; £*) max , 
even in the presence of considerable background. 

• find the renormalizcd Xmin/^j *- e -> ^ x Xmin/^i where 1Z is the inverse of the factor given in eq. J5j as 
a function of A = A% 2 (xi; a) max and plotted in FigureEl 

• use the renormalized Xmin/^ *° estimate the goodness-of-fit of the model employing the standard \ 2 
probability distribution function. We thus estimate the probability that the data set fits the model, 
allowing one to decide whether to accept or reject the model. 

• make a robust evaluation of the parameter errors and their correlations, by multiplying the standard 
covariance matrix C found in the \ 2 n t by the appropriate value of (r x 2) 2 for the cut Axf max - The 
value of r x 2 is given by eq. (0J and shown in Figure [3] as a function of the cut Axf max , where it is 
called A. It ranges from 1 for very large A to ~ 1.14 for A = 2 in eq. J2J. However, this is not the 
complete story. The parameter error is a = r % 2 x £ and we must also take into account the increase in 
£ due to the cut A, which causes the loss of signal points. As shown in Table^ an d discussed in detail 
in Section ET41 the true loss of accuracy at A = 2 — relative to an unsifted sample of signal data — is the 
factor ~ 1.25. Thus, the algorithm starts failing rapidly for cuts A smaller than 2. 

In conclusion, the " Sieve" algorithm gains its strength from the combination of making first a A 2 , fit to 
get rid of the outliers and then a x 2 fit to the sifted data set. By varying the Ax 2 (xi; ") max to suit the 
data set needs, we easily adapt to the different contaminations of outliers that can be present in real-world 
experimental data samples. 

Not only do we now have a robust goodness-of-fit estimate, but we also have also a robust estimate of the 
parameters and, equally important, a robust estimate of their errors and correlations. The phenomenologist 
can now eliminate the use of possible personal bias and guesswork in "cleaning up" a large data set. 
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A Robust Estimation 

The terminology, "robust" statistical estimators was first introduced to deal with small numbers of data 
points which have a large departure from the model predictions, i.e., outlier points. Later, research on 
robust estimation [HIE] based on influence functions was carried out. More recently, robust estimations using 
regression models [TTI] were made — these are inadequate for fitting non-linear models which often are needed 
in practical applications. For example, the fit needed for eq. (JSJ is a non-linear function of the coefficients 
Co, ci, C2, . . ., since it is the ratio of two linear functions. We will discuss one possible technique for handling 
outlier points in a non-linear fit when we introduce the Lorentz probability density function in Section |A. 41 

A.l Maximum Likelihood Estimates 

Let Pi be the probability density of the zth individual measurement, i = 1, . . . , A, in the interval Ay. Then 
the probability of the total data set is 



N 

V = Y[P t Ay. 



(10) 



Let us define the quantity 



Vi - y{xi\a) 



(11) 



where yi is the measured value at Xi, y(xf,a) is the expected (theoretical) value from the model under 
consideration, and <7j is the experimental error of the zth measurement. The M model parameters ak are 
given by the M-dimensional vector a — {a\, . . . , aj/}. 

V is identified as the likelihood function, which we shall maximize as a function of the parameters 
a = {ax, . . . , a M }- 

For the special case where the errors are normally distributed (Gaussian distribution), we have the 
likelihood function V given as 
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v = n i ex p 



\(vi- y{xi]cx) 
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Maximizing the likelihood function V in eq. (|12|l is the same as minimizing the negative logarithm of V, 
namely, 
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1 (Vi- y{xi;a) 



AT In 



Ay 



27TCT,' 



Since N , Ay and <Ji are constants, after using eq. this is equivalent to minimizing the quantity 



1 N 



(13) 
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We now define x 2 ( a ; x ) as 

JV 



X 



i=l 



where x = {xi, . . . , Xi, . . . , Xn}- 

Hence, the \ 2 minimization problem, appropriate to the Gaussian distribution, reduces to 



N 



minimize over a, % 2 (a; x) = Axi(xi] a) 



(15) 



(16) 



for the set of N experimental points at Xi having the value yi and error <Ji 



A. 2 Gaussian Distribution 

To minimize \ 2 > we must solve the (in general, non-linear) set of M equations 
1_ / yj - y{x l ;a) \ ( dy(x l ; ...aj...) 

The Gaussian distribution allows a x 2 minimization routine to return several exceedingly useful statistical 
quantities. Firstly, it returns the best-fit parameter space a mm . Secondly, the value of Xmim wnen compared 
to the number of degrees of freedom ( d.f.= v = N — M, the number of data points minus the number 
of fitted parameters) allows one to make standard estimates of the goodness of the fit of the data set to 
the model used, using the \ 2 probability distribution function, given in standard texts 0, for v degrees of 
freedom. Further, C _1 , the M X M matrix of the partial derivatives at the minimum, given by 




= 0, j = l,. 



(17) 
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allows us to compute the standard covariance matrix C for the individual parameters oti, as well as the 
correlations between aj and afc0. Thus, when the errors are distributed normally, the \ 2 technique not 
only gives us the desired parameters e* m in, but also furnishes us with statistically meaningful error estimates 
of the fitted parameters, along with goodness-of-fit information for the data to the chosen model — very 
valuable quantities for any model under consideration. 



A. 3 Robust Distributions 

We can generalize the maximum likelihood function of eq. I|12[) , which is a function of the variable y, -~ y ( x " a } , 



A ' 1 / yi-y(xi;a) 



i=l 



exp 



Ay , (19) 



where the function p ^ — U^±<£Q. J j s the negative logarithm of the probability density. Note that the sta- 
tistical function p used in this Appendix has nothing to do with the p- value used in eq. Thus, we now 
have to minimize the generalization of eq. (|14|l . i.e., 

minimize over a, > pi I , (21) J 

i=i ^ * ' 

for the iV-dimensional vector x. 

This yields the more general set of M equations 

f:^( yi -f i;a) )( dy{X %^--- ) )=0, J 1 M- (21) 

i— 1 ^ \ / \ J / 

where the influence function ip(z) in eq. (|21[1 is given by 

_ df3(z) _ yt - y(xi;a) 



tA(z) = ^-^, z= yl » y > = signfa - y( Xi ; a)) x JA^faa). (22) 
dz Oi v ' 

Comparison of eq. (|21|l with the Gaussian equivalent of eq. (|17|) shows that 

p{z) = 2 z2 ' ^1 ) { Z ) — z (f° r a Gaussian distribution). (23) 

We note that for a Gaussian distribution, the influence function w(z) for each experimental point i is 
proportional to \J Axf, the normalized departure of the point from the theoretical value. Thus, the more 
the departure from the theoretical value, the more "influence" the point has in minimizing x 2 - This gives 
outliers (points with large departures from their theoretical values) unduly large "influence" in computing 
the best vector a, easily skewing the answer due to the inclusion of these outliers. 



A. 4 Lorentz Distribution 

Consider the normalized Lorentz probability density distribution (also known as the Cauchy distribution or 
the Breit-Wigner line width distribution), given by 

p W = ~TT~ — 2' ( 24 ) 

7T 1 + "fZ Z 

where 7 is a constant whose significance will be discussed later. Using eq. I|ll|) and eq. i|22|) . we rewrite 
eq. Q24fl in terms of the measurement errors Oi and the experimental measurements yi at Xi as 



P 



y l -y(a; i ;a)\ _ ^7 



I + 7 V' ' ) 

V7 I 
7T l + ^Axf{xi;a)' 



(25) 
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It has long tails and therefore is more suitable for robust fits than is the Gaussian distribution. Taking the 
negative logarithm of eq. (|25[) and using it in eq. 1|20[) . we see that 

p(z) = ln(l + 7 z 2 ) = ln{l + 7Ax-(xj;a)} and 

= ^_ = si ^y* - gfog))* y/MER. (26) 

' l+ 7 z 2 l + 7 Ax ! 2 (i ! ;a) 

In analogy to x 2 minimization, we must now minimize A 2 (a; a;), the Lorentzian squared, with respect to 
the parameters a, for a given set of experimental points x, i.e., 

N 

minimize over a, A 2 (a; x) = In {l + ~fAxi(xi; a)} , (27) 

2=1 

for the set of N experimental points at Xi having the value yi and error <7j . 

We have made extensive computer simulations using Gaussianly generated data (constant and straight line 
models) which showed empirically that the choice 7 = 0.18 minimized the rms (root mean square) parameter 
widths found in A 2 minimization. Further, it gave rms widths that were almost as narrow as those found in 
X 2 minimization on the same data. We will adopt this value of 7, since it effectively minimizes the width 
for the A 2 routine, which we now call A 2 , (a; x). Thus we select for our robust algorithm, 

N 

minimize over a, A„(a; x) = In {l + 0.18Ax 2 (£i; a)} . (28) 

2=1 

An important property of A 2 , (a; x) is that it numerically gives the same total Xo ■ as that found in a x 2 
fit, i.e. Xo = J2iLi A-Xi ( x i] a ): where the A%?(xi;a) come from the minimization of A 2 , in eq. (|28|l . is the 
same as the Xmin found using a standard x 2 minimization on the same data. 

We note from eq. I)26|) that the influence function for a point i for small \J A% 2 increases proportional to 
^Axf (just like the G aussian distribution does), whereas for large y A% 2 , it decreases as 1/yAx?. Thus, 
large outliers have much less "influence" on the fit than do points close to the model curve — this feature 
makes A 2 minimization robust. Thus, outliers have little influence on the choice of the parameters a min 
resulting from the minimization of A 2 ,, a major consideration for a robust minimization method. 

Unlike the minimization of x 2 5 the minimization of A 2 ,, while yielding the desired robust estimate of 
a m i n , gives neither parameter error information on a m j n nor a conventional goodness-of-fit. These are major 
failings, since one has no objective grounds for accepting or rejecting the model. We will rectify these 
shortcomings in the main section of the text, Section where we describe the adaptive "Sieve" algorithm. 
Extensive computer studies, summarized in Section [3.41 demonstrate that use of this algorithm enables one 
to make a robust error estimate of a m j n , as well as a robust estimate of the goodness-of-fit of the data to the 
model. 
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Ay 2 — 9 

At max 


Ay 2 = 6 

At max 


Ay 2 = 4 

At max 


^Xi max ^ 


J*X 2 ,str. line 


1.034 


1.054 


1.098 


1.162 


^*X 2 , constant 


1.00 


1.05 


1.088 


1.108 


average 


1.018 


1.052 


1.093 


1.148 


< Xmin > /" 

str. line 


0.974 


0.901 


0.774 


0.508 


constant 


0.973 


0.902 


0.774 


0.507 


average 


0.973 


0.901 


0.774 


0.507 


n- 1 


0.9733 


0.9013 


0.7737 


0.5074 


S.F. 


0.9973 


0.9857 


0.9545 


0.8427 


cr/<T 


1.02 


1.06 


1.19 


1.25 



Table 1: Results for r x i — cr/E, the ratio of the rms width to E, the error for the \ 2 fit; < Xmin > / v > f°r both the 
straight line case and the constant case; a/uo, the ratio of the rms width (error) of the parameter relative to what 
the error would be if the sample were not truncated, i.e., the total loss of accuracy due to truncation, as functions 
of the cut Ax? max - The average results for r x 2 and < Xmin > l v are graphically shown in Fig. |U] See Sections 13.21 
13.31 and 13.41 for details. The theoretical values for the renormalization factor are from eq. @ and the survival 
fractions S.F. are from eq. ©. See Section f3. 41 for a discussion of the error- broadening factor a/ao- 



Fitted 


Ay 2 

At max 


Parameters 


4 


6 


9 


ci (mb) 


-1.452 ±0.066 


-1.448 ±0.066 


-1.423 ±0.065 


c 2 (mb) 


0.2828 ±0.0061 


0.2825 ±0.0060 


0.2801 ±0.0059 


/(0) (mb GeV) 


-0.065 ±0.56 


-0.020 ±0.56 


-0.065 ±0.56 


x 2 ■ 

A mm 


142.8 


182.8 


217.9 


V (di). 


182 


190 


195 


ft X Xmin/f 


1.014 


1.067 


1.143 



Table 2: The fitted results for a 3-parameter fit to the total cross sections and p- values for pp and pp scattering. The 
renormalized xV^min, taking into account the effects of the Axi max cut, is given in the row labeled TZ x Xmin/^- 
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(GeV) 




Ax? 


= 4 

max 




Ay 2 = 6 

Ai max 


Predicted Error 


Cpp 


(Jpp 


ppp 


Ppp 


@pp 


(Jpp 


Ppp 


Ppp 




(Jpp 


Ppp 


Ppp 


10 


43.77 


38.34 


-0.0368 


-0.1501 


43.77 


38.33 


-0.0365 


-0.1498 


.01 


.01 


.003 


.004 


100 


46.61 


46.25 


0.1083 


0.1031 


46.61 


46.25 


0.1082 


0.1031 


.08 


.08 


.001 


.001 


540 


60.87 


60.82 


0.1368 


0.1363 


60.86 


60.81 


0.1367 


0.1362 


.28 


.28 


.001 


.001 


1800 


75.30 


75.29 


0.1396 


0.1395 


75.28 


75.27 


0.1396 


0.1395 


.50 


.50 


.001 


.001 


14000 


107.6 


107.6 


0.1318 


0.1318 


107.5 


107.5 


0.1318 


0.1318 


1.0 


1.0 


.001 


.001 



Table 3: The predicted results for a Pp , a pp , p Pp and p pp , together with their errors, as a function of y/s, the cms 
energy in GeV, for AyJ max = 4 and Ax; max = 6 . The cross sections and their errors are in mb. The predicted 
errors are those found from a standard \ 2 analysis. 



Fitted 


Ay 2 

A « max 


Parameters 


4 


6 


9 


ci (mb) 


-0.895 ±0.11 


-0.921 ±0.11 


-0.982 ±0.10 


c 2 (mb) 


0.174 ±0.0083 


0.177 ±0.0081 


0.182 ±0.0075 


/(0) (mb GeV) 


-2.281 ±0.34 


-2. 307 ±0.34 


-2.327 ±0.34 


y 2 • 

A mm 


128.7 


148.1 


204.4 


V (di). 


122 


127 


135 


ft X Xmin/^ 


1.364 


1.293 


1.556 



Table 4: The fitted results for a 3-parameter fit to the total cross sections and p-values for 7r + p and 7r p scattering. 
The renormalized xV^min, taking into account the effects of the Ax? max cut, is given in the row labeled 1Z x Xmin/^- 



(GeV) 




Ax? 


= 4 

max 






Ax? 


max ^ 




Predicted Error 






Ptt~ p 


Pn+p 


p 


°V+p 


Ptt~ p 


PlT + p 


p 




Pit p 


P-ff + p 


6 


25.40 


23.70 


-0.1391 


-0.2704 


25.40 


23.70 


-0.1396 


-0.2708 


.01 


.01 


.009 


.010 


15 


24.26 


23.35 


0.0392 


-0.0248 


24.27 


23.36 


0.0396 


-0.0243 


.01 


.01 


.002 


.002 


23.5 


24.92 


24.25 


0.0827 


0.0385 


24.94 


24.27 


0.0833 


0.0393 


.02 


.02 


.002 


.002 


62.5 


28.15 


27.81 


0.1309 


0.1117 


28.20 


27.86 


0.1318 


0.1127 


.09 


.09 


.003 


.003 



Table 5: The predicted results for cr n - p , u^ +p , p^- p and p^+ p , together with their errors, as a function of -/s, the 
cms energy in GeV, for Ax? max = 4 and Ax? max = 6 . The cross sections and their errors are in mb. The predicted 
errors are those found from a standard \ 2 analysis. 
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Figure 1: The data points shown are all of the experimental data listed in the Particle Data Group 1 site for pp and pp total 
cross sections in the energy interval E cms > 6 GeV. The open circles are ap p and the squares are a vv . 




Figure 2: The data points shown are all of the experimental data listed in the Particle Data GroupJTf site for pp and pp 
p-values (ratio of the real to the imaginary portion of the forward scattering amplitude) in the energy interval E cms > 6 GeV. 
The open circles are pp p and the squares are p pp . 
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Figure 3: The data points shown are all of the experimental data listed in the Particle Data Group^ sl ^e for n p and 7r + p 
total cross sections in the energy interval E C ms > 6 GeV. The open circles are a^— p and the squares are cr^.+ p . 
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Figure 4: The data points shown are all of the experimental data listed in the Particle Data Group0 site for n~p and 7r+p 
p-values (ratio of the real to the imaginary portion of the forward scattering amplitude) in the energy interval E cms > 6 GeV. 
The open circles are p n + p and the squares are p^- p - 
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a) 
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gaussian data, with Ax { < 6 
best fit, y = 0.998 -2.014x 
fit using y 2 routine on all data, 
including outliers 



b) 



Figure 5: a) The 100 squares are a computer-generated Gaussianly distributed data set about the straight line y = 1 — 2x. 
The 20 open circles are randomly distributed noise data. See Section l3.1l for details. 

b) The 100 data points shown are the result of screening all 120 data points for those points having A\ 2 < 6. There were no 
noise points (open circles) retained in the Sieve and the 100 squares are the Gaussian data retained in the Sieve. The best fit 
curve to all points with A\ 2 < 6, y = a + bx, is the solid curve, where a = 0.998±0.12, 6 = -2.014±0.020, and X 2 • A* = °- 91 ' 
yielding a renormalized value TZ X Xmin/^ = compared to the expected < \ 2 > jv = 1.0 ± 0.14. The dashed-dot curve is 
a x 2 fit to the totality of data — 100 signal plus 20 noise points — which has \' 2 ■ l v = 4.8. 
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b) 



Figure 6: a) The 100 squares are a computer-generated Gaussianly distributed data set about the constant y = 10. The 40 
open circles are randomly distributed noise data. See Section 13.31 for details. 

b) The 98 data points shown are the result of screening all 140 data points for those points having Ax 2 < 4. There were no 
noise points (open circles) retained in the Sieve and the 98 squares are the Gaussian data retained in the Sieve. The best fit 
curve to all points with A\ 2 < 4, y = c, is the solid curve, where c = 9.98 ± 0.074, and X^in/ 17 = 0-84, yielding a renormalized 
value 7Z X Xmin/^ = compared to the expected < x 2 > /" = 1-0 ± 0.14. The dashed-dot curve is a x 2 fit to the totality of 
data — 100 signal plus 40 noise points — which has Xmm/^ = 4-39. 
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Figure 7: a) The 100 squares are a computer-generated Gaussianly distributed data set about the constant y = 10. The 40 
open circles are randomly distributed noise data. See Section 13.31 for details. 

b) The 99 data points shown are the result of screening all 140 data points for those points having Ax 2 < 9. There were no 
noise points (open circles) retained in the Sieve and the 98 squares are the Gaussian data retained in the Sieve. The best fit 
curve to all points with Ax 2 < 9, y = c, is the solid curve, where c = 10.05 ±0.074, and X 2 n i I ,A / = 1-08, yielding a renormalized 
value 7Z X Xmin/ U = I'll compared to the expected < x 2 > jv = 1-0 ± 0.14. The dashed-dot curve is a x 2 fit to the totality of 
data — 100 signal plus 40 noise points — which has Xmin/ U = 8-10. 
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Fi gure 8: a) The 100 squares are a computer-generated Gaussianly distributed data set about the parabola y = l + 2:c + 0.5:r 2 . 
The 35 open circles are randomly distributed noise data around the parabola y = 12 + 2x + 0.2x 2 . See Section l3.6l for details, 
b) The 113 data points shown are the result of screening all of the data for those points having Ax 2 < 6. The open circles are 
the 13 noise points retained in the Sieve and the 100 squares are the Gaussian data retained in the Sieve. The best fit curve to 
all points with Ax 2 < 6, y = 1.23 + 2.04x + 0.48x 2 , is the solid curve. The dashed curve is a x 2 fit to the totality of data in 
Fig. [5] consisting of signal plus noise. 
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A% : CUt 



Figure 9: a) A plot of eq. TZ -1 , the reciprocal of the factor that multiplies X 2 n i n / I/ found in the x 2 fit to the sifted data 
set vs. Ax 2 cu t, the Ax? m cut. b) A plot of eq. J4}: r x 2 , the factor whose square multiplies the covariant matrix found in 
the x 2 fit to the sifted data set vs. Ax?, the x 2 cut. See Sections 13.21 13.31 and for details. In eq. Jl) and eq. JSJ, the Ax? 
cut is called A. 
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Fi gUI6 10; The data points shown are the result of screening all of the points of Fig. 0for those cross section points with 
Axf < 6. The open circles are ap p and the squares are a pv . The solid line is the theoretical fit to ap p and the dashed line is 
the theoretical fit to cr„„. 




Fi gUX6 11* The data points shown are the result of screening all of the points in Fig. [^jfor those p- value points with < 6. 
The open circles are p pp and the squares are p pp . The solid line is the theoretical fit to pp p and the dashed line is the theoretical 
fit to p PP . 
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Fi gUX6 12; The data points shown are the result of screening all of the points of Fig. |^jfor those cross section points with 
Ax 2 < 6. The open circles are o n - p and the squares are <r w + p - The solid line is the theoretical fit to cr n - p and the dashed line 
is the theoretical fit to cr^+ p . 
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Figure 13: The data points shown are the result of screening all of the points in Fig. [I]for those p- value points with < 6. 
The open circles are p n - p and the squares are p n + p - The solid line is the theoretical fit to p^- p and the dashed line is the 
theoretical fit to p^+ p - 
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